[RelayMiner]: add proxy.Ping(...) capability to test connectivity between relay servers and backend URLs #1037

Open
wants to merge 56 commits into
base: main

Conversation

eddyzags

Summary

This PR adds the capability to test the connectivity between the Relay Servers and the Backend URLs in two ways.

  1. Safeguard at Startup:
    For every suppliers.[].service_config.backend_url referenced in the Relay Miner configuration file, the Relay Proxy verifies whether the network connection between the targeted backend_url and the relayminer process is functioning properly. If one or more connections can't be established, the relay miner won't start.

  2. Configurable Ping HTTP server:
    The Relay Miner process listens for incoming requests and synchronously tests the connectivity of every referenced suppliers.[].service_config.backend_url. If one or more backend URLs aren't reachable, the incoming request fails.

Each serverConfig.ServerType (for example, HTTP) implements its own logic to test the connectivity.
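
As a rough illustration of the startup safeguard described above, here is a minimal sketch that dials each backend URL over TCP and refuses to start if any dial fails. It is not the PR's implementation: `pingBackend`, `pingAll`, `startupCheck`, the 2-second budget and the use of a plain TCP dial as the reachability test are all assumptions made for this example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/url"
	"time"
)

// pingBackend opens (and immediately closes) a TCP connection to the host of
// rawURL. Hypothetical helper for illustration; not the PR's implementation.
func pingBackend(ctx context.Context, rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return fmt.Errorf("parsing backend URL %q: %w", rawURL, err)
	}
	if u.Hostname() == "" {
		return fmt.Errorf("backend URL %q has no host", rawURL)
	}

	// Fall back to the scheme's default port when none is given.
	port := u.Port()
	if port == "" {
		port = "80"
		if u.Scheme == "https" {
			port = "443"
		}
	}

	var dialer net.Dialer
	conn, err := dialer.DialContext(ctx, "tcp", net.JoinHostPort(u.Hostname(), port))
	if err != nil {
		return fmt.Errorf("dialing backend %q: %w", rawURL, err)
	}
	return conn.Close()
}

// pingAll checks every backend URL and joins the failures into one error.
// It returns nil only when every backend is reachable.
func pingAll(ctx context.Context, backendURLs []string) error {
	var errs []error
	for _, rawURL := range backendURLs {
		if err := pingBackend(ctx, rawURL); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// startupCheck gives the whole check a 2-second budget (an arbitrary choice).
func startupCheck(backendURLs []string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return pingAll(ctx, backendURLs)
}

func main() {
	// A local listener stands in for a real backend.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	if err := startupCheck([]string{"http://" + ln.Addr().String()}); err != nil {
		fmt.Println("refusing to start:", err)
		return
	}
	fmt.Println("all backends reachable")
}
```

Joining the individual failures with `errors.Join` means one startup error can list every unreachable backend, rather than only the first one encountered.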

Issue

Type of change

Select one or more:

  • New feature, functionality or library
  • Bug fix
  • Code health or cleanup
  • Documentation
  • Other (specify)

Testing

Documentation changes (only if making doc changes)

  • make docusaurus_start; only needed if you make doc changes

Local Testing (only if making code changes)

  • Unit Tests: make go_develop_and_test
  • LocalNet E2E Tests: make test_e2e
  • See quickstart guide for instructions

PR Testing (only if making code changes)

  • DevNet E2E Tests: Add the devnet-test-e2e label to the PR.
    • THIS IS VERY EXPENSIVE, so only do it after all the reviews are complete.
    • Optionally run make trigger_ci if you want to re-trigger tests without any code changes
    • If tests fail, try re-running failed tests only using the GitHub UI as shown here

Sanity Checklist

  • I have tested my changes using the available tooling
  • I have commented my code
  • I have performed a self-review of my own code; both comments & source code
  • I create and reference any new tickets, if applicable
  • I have left TODOs throughout the codebase, if applicable

Summary by CodeRabbit

  • New Features

    • Introduced a new configuration section for the ping functionality, allowing users to test backend connectivity within the relay miner's setup.
    • Added methods to handle ping requests, enhancing health check capabilities for relay servers.
  • Bug Fixes

    • Improved error handling during the server startup process if any relay server is unreachable.
  • Tests

    • Added tests for the new ping functionality to ensure operational integrity and reliability of the relay miner.

@Olshansk Olshansk added tooling Tooling - CLI, scripts, helpers, off-chain, etc... community A ticket intended to potentially be picked up by a community member nice-to-have Not-important and not-urgent labels Jan 22, 2025
@Olshansk Olshansk added this to the Beta TestNet Iteration milestone Jan 22, 2025
Contributor

@bryanchriswhite bryanchriswhite left a comment

Thanks for picking this back up @eddyzags! 🙌

I have to stop here for today but this is looking great so far! 🚀
The biggest thing I haven't reviewed yet is the test (but I already saw the addition of go-mockdns, and I skimmed the test names 😉) and am looking forward to it.

Contributor

@bryanchriswhite bryanchriswhite Jan 24, 2025

Was this change intentionally persisted, and if so, how is it related to this feature?

I think this change should be reverted. My assumption is that this is the result of an older commit which was never reconciled completely with main:

  1. The yaml files referenced don't exist.
  2. The flags seem to be specifying the same/similar config as what's been removed from the relayminer configs that do exist. 🤔

Author

Sorry, I wasn't clear in my previous comments.

Was this change intentionally persisted, and if so, how is it related to this feature?

Yes, this change was intentionally made to ensure the Ping safeguard at startup succeeds for the Relayminer with the localnet default configuration, and/or any custom localnet configuration in that regard (link to localnet default configuration in the main branch). In the default localnet configuration, the Ollama Kubernetes deployment is not applied (ollama.enabled=false). However, the relayminer configuration still referenced Ollama suppliers in its configuration files, even though the container wasn’t deployed (link to relayminer-1 configuration for localnet). With the newly introduced mechanism of the Ping safeguard at startup, this will cause the relayminer to fail continuously because the Ollama container isn't deployed.

To solve this issue, I found a way to dynamically define the relayminer's configuration based on the localnet configuration by modifying the poktrolld/Tiltfile. Hence, those modifications.

poktrolld users who deploy a Relayminer without relying on the localnet will have to make sure that every config.suppliers[*].service_config.backend_url is up, running, and reachable before deploying the Relayminer.

The yaml files referenced don't exist.

I disagree, they exist:

The flags seem to be specifying the same/similar config as what's been removed from the relayminer configs that do exist.

I cannot find that. Can you link me to the precise line in my fork that makes you think that please? 🙏🏾

Contributor

@eddyzags thanks for the detailed response here! 🙌

  1. The yaml files referenced don't exist.

I was referring to .yaml files referenced in this commit, but I also see that they're not referenced any more. I just didn't understand the rationale behind moving the config into the Tiltfile.

(@okdas @red-0ne thoughts?)

The flags seem to be specifying the same/similar config as what's been removed from the relayminer configs that do exist.

I was just pointing out that the config fields which you've removed from the relayminer configs correspond to the flags you've added in the Tiltfile. The point was to question why we should prefer providing the config via flags over the yaml file, which you answered.

Author

Got it, Bryan; I am glad it was clear. Waiting for feedback from @okdas and @red-0ne. I am open to suggestions.

Member

@eddyzags This LGTM but #PUC in the code with your explanation related to Ping safeguard.

You already have it written down anyhow :)

server.Handler = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
sendJSONRPCResponse(test.t, w)
})
listener, err := net.Listen("tcp", supplierConfig.ServiceConfig.BackendUrl.Host)
Contributor

Why separate the listener from the server?

Author

By using a custom listener, and thereby decoupling listening from serving, we ensure that the HTTP server is already bound to its port in the test's main goroutine. This guarantees that the HTTP server(s) are ready before proceeding to the actual test cases.

Previously, listening and serving were both handled inside a goroutine using the http.ListenAndServe function. This approach sometimes led to the HTTP server not being ready when the test cases began executing, resulting in flaky test failures.
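
The pattern described above can be sketched roughly as follows. This is an illustrative example, not the PR's test helper: `startBackend` and `roundTrip` are hypothetical names, and the sketch binds to `127.0.0.1:0` (an OS-assigned port) where the PR binds to the host taken from the supplier config.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
)

// startBackend binds the port synchronously and only then serves in a
// goroutine. When it returns, the OS is already accepting connections on the
// returned address, so callers need no sleep or retry loop.
func startBackend(handler http.Handler) (*http.Server, string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the OS pick a free port
	if err != nil {
		return nil, "", err
	}
	srv := &http.Server{Handler: handler}
	go func() {
		// Serve returns http.ErrServerClosed once srv.Close is called.
		_ = srv.Serve(ln)
	}()
	return srv, ln.Addr().String(), nil
}

// roundTrip starts a backend and immediately sends it one request.
func roundTrip() (string, error) {
	handler := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "pong")
	})
	srv, addr, err := startBackend(handler)
	if err != nil {
		return "", err
	}
	defer srv.Close()

	resp, err := http.Get("http://" + addr)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := roundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

The request can be sent right after `startBackend` returns because a bound listener queues incoming connections in the kernel until the serving goroutine gets around to accepting them.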

Contributor

@bryanchriswhite bryanchriswhite Jan 28, 2025

Amazing! 👍 #PUC with that explanation, perhaps condensed, if possible.

Member

#PUC

@eddyzags
Author

Thanks for reviewing @bryanchriswhite ! Waiting for the rest of the review 🚀

Contributor

@bryanchriswhite bryanchriswhite left a comment

Amazing work @eddyzags! 🙌 🚀

I had a few more suggestions and I would like to get some feedback from @okdas and @red-0ne, but otherwise I think this is just about ready to 🚢. 😎

Comment on lines 699 to 702
relayProxyBehavior := append(t.relayerProxyBehavior, []func(*testproxy.TestBehavior){
testproxy.WithDefaultSupplier(t.supplierOperatorPingAllKeyName, supplierEndpoints),
testproxy.WithServicesConfigMap(servicesConfigMap),
}...)
Contributor

#PUC options provided via multiple calls to testproxy.WithDefaultSupplier(...) and testproxy.WithServicesConfigMap(...) are not mutually exclusive; the former accumulates service endpoints into a testutil global variable, and the latter starts an http server for each service in the given services config map.

Author

I'm sorry, but I'm not sure I understand this. I aimed to define two relay servers with their own suppliers and services managed by the same relay proxy. Then call PingAll to see if the mechanism is working across multiple relay servers.

Author

I've refactored this part by using testproxy.WithDefaultSupplier(...) & testproxy.WithServicesConfigMap(...) one time only. eddyzags@598f85e

return nil
}

func (rel *relayMiner) newPinghandlerFn(ctx context.Context, ln net.Listener) http.HandlerFunc {
Contributor

It seems like the ln variable is unused.

Suggested change
- func (rel *relayMiner) newPinghandlerFn(ctx context.Context, ln net.Listener) http.HandlerFunc {
+ func (rel *relayMiner) newPinghandlerFn(ctx context.Context) http.HandlerFunc {

// ping requests. A single ping request on the relay server broadcasts a
// ping to all backing services/data nodes.
go func() {
if err := http.Serve(ln, rel.newPinghandlerFn(ctx, ln)); err != nil && !errors.Is(http.ErrServerClosed, err) {
Contributor

Suggested change
- if err := http.Serve(ln, rel.newPinghandlerFn(ctx, ln)); err != nil && !errors.Is(http.ErrServerClosed, err) {
+ if err := http.Serve(ln, rel.newPinghandlerFn(ctx)); err != nil && !errors.Is(http.ErrServerClosed, err) {

@@ -57,3 +60,69 @@ func TestRelayMiner_StartAndStop(t *testing.T) {
err = relayminer.Stop(ctx)
require.NoError(t, err)
}

func TestRelayMiner_Ping(t *testing.T) {
Contributor

Would you mind adding one more test for the error cases where the relayer proxy mock's #PingAll() returns the temporary and non-temporary *url.Errors such that we can assert (and cover regression of) the resulting error returned from the HTTP GET on the ping endpoint?

Author

@eddyzags eddyzags Feb 9, 2025

Agree. Added here eddyzags@43838e7 (Refactor to test suite here: eddyzags@4c5bc4b)
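
For readers unfamiliar with the distinction being tested, the sketch below shows one way a ping endpoint could tell temporary from non-temporary `*url.Error` values. The status codes chosen here (503, 502, 500) are assumptions for illustration only; the thread does not state which codes the PR returns. `flakyErr`, `newURLError` and `statusForPingError` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
)

// flakyErr is a stand-in inner error whose Temporary method is configurable.
type flakyErr struct{ temporary bool }

func (e flakyErr) Error() string   { return "simulated network failure" }
func (e flakyErr) Temporary() bool { return e.temporary }

// newURLError builds the kind of *url.Error a mocked PingAll could return.
func newURLError(temporary bool) error {
	return &url.Error{Op: "Get", URL: "http://backend.invalid", Err: flakyErr{temporary}}
}

// statusForPingError maps a PingAll result to an HTTP status code.
// The specific codes are assumptions made for this sketch.
func statusForPingError(err error) int {
	var urlErr *url.Error
	switch {
	case err == nil:
		return http.StatusNoContent // every backend answered
	case errors.As(err, &urlErr) && urlErr.Temporary():
		return http.StatusServiceUnavailable // transient failure, a retry may help
	case urlErr != nil:
		return http.StatusBadGateway // non-temporary *url.Error
	default:
		return http.StatusInternalServerError // any other error
	}
}

func main() {
	fmt.Println(statusForPingError(nil))
	fmt.Println(statusForPingError(newURLError(true)))
	fmt.Println(statusForPingError(newURLError(false)))
	fmt.Println(statusForPingError(errors.New("boom")))
}
```

`(*url.Error).Temporary` reports true only when the wrapped error itself has a `Temporary() bool` method that returns true, which is why the mock needs an inner error type such as `flakyErr`.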

Contributor

(cc @okdas)

…ty between relay servers and backend URLs (#1)

* relayer: add RelayServers() method to RelayProxy interface; Add Ping(), ServiceIDs(), Forward() method to RelayServer interface; add RelayServers slice with helper method byServiceID

* relayer: add forward config entry

* relayer: implement ServiceIDs, Forward, and Ping method for synchronous RPC server

* relayer: add RelayServers implementation for RelayProxy

* relayer: add Ping and Forward options

* relayer: integrate ping option

* relayer: add ServePing and ServeForward method to RelayMiner

* test proxy.Ping() in test + remove forward feature

* add serve ping test

* add doc
Member

@Olshansk Olshansk left a comment

@eddyzags Wasn't my intention to have this hanging for so long, but I'm glad @bryanchriswhite and you had a good back & forth to get it here.

In terms of next steps:

  1. Please see @bryanchriswhite's comments
  2. See my minor NITs
  3. Merge with the latest main
  4. @okdas will review the one TiltFile / k8s related comment
  5. Can you upload a video to the GitHub PR description showing this?

Especially as PGAT is getting kicked off (and we have some large beta users), I think everyone will love this!

### Localnet Helpers ###
########################

.PHONY: localnet_relayminer1_ping
Member

Can you move these into a ping.mk and add an import at the bottom of this makefile?

backend_url: http://rest:10000/
publicly_exposed_endpoints:
- relayminer1
suppliers: [] # suppliers list is dynamically defined in poktroll/Tiltfile.
Member

I just want a 👍 from @okdas in case this has downstream effects on our E2E testing in ephemeral DevNets.

Member

I don't see this breaking e2e on devnets.

Comment on lines +181 to +184
Configures a `ping` server to test the connectivity of all backend URLs. If
all the backend URLs are reachable, the endpoint returns a 204 HTTP
Code. If one or more backend URLs aren't reachable, the service
returns an appropriate HTTP error.
Member

Suggested change
- Configures a `ping` server to test the connectivity of all backend URLs. If
- all the backend URLs are reachable, the endpoint returns a 204 HTTP
- Code. If one or more backend URLs aren't reachable, the service
- returns an appropriate HTTP error.
+ // ConfigurePingHandler sets up a health check server that:
+ // - Tests connectivity to all configured backend URLs
+ // - Returns HTTP 204 if all backends are reachable
+ // - Returns appropriate HTTP error if any backend is unreachable
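
The behaviour documented above (204 when every backend is reachable, an HTTP error otherwise) could look roughly like the handler below. This is a sketch under assumptions: `newPingHandler` and `probeStatus` are hypothetical names, the connectivity check is injected as a plain function, and 500 is used for every failure because the documentation only says "an appropriate HTTP error".

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newPingHandler wraps a connectivity check in an HTTP handler.
// pingAll is injected so the handler can be exercised without real backends.
func newPingHandler(pingAll func(context.Context) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := pingAll(r.Context()); err != nil {
			// Assumption: this sketch answers 500 for any failure.
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	}
}

// probeStatus sends one GET to the handler and returns the status code.
func probeStatus(healthy bool) int {
	handler := newPingHandler(func(context.Context) error {
		if healthy {
			return nil
		}
		return errors.New("backend unreachable")
	})
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Code
}

func main() {
	fmt.Println("healthy:", probeStatus(true))
	fmt.Println("unhealthy:", probeStatus(false))
}
```

Passing the request's own context to the check means a client that disconnects early also cancels the outstanding backend probes.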

Comment on lines +40 to +43
// NewMockOneTimeRelayerProxyWithPing creates a new mock RelayerProxy. This mock
// RelayerProxy will expect a call to ServedRelays with the given context, and
// when that call is made, returnedRelaysObs is returned. It also expects a call
// to Start, Ping, and Stop with the given context.
Member

Suggested change
- // NewMockOneTimeRelayerProxyWithPing creates a new mock RelayerProxy. This mock
- // RelayerProxy will expect a call to ServedRelays with the given context, and
- // when that call is made, returnedRelaysObs is returned. It also expects a call
- // to Start, Ping, and Stop with the given context.
+ // NewMockOneTimeRelayerProxyWithPing creates a new mock RelayerProxy that:
+ // - Expects a call to ServedRelays with the given context
+ // - Returns returnedRelaysObs when ServedRelays is called
+ // - Expects one call each to Start, Ping, and Stop with the given context

}()

go func() {
<-ctx.Done()
Member

#PUC when we expect this context to close.

I'm guessing (assumption, gut intuition) that this is when the process shuts down, but making it explicit would be nice.
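
To make the lifecycle in question concrete, here is a minimal sketch of a server that stops when its context is cancelled. It assumes, as the comment above guesses, that `ctx` is the process-lifetime context; `servePing` and `runAndCancel` are hypothetical names and the 1-second shutdown budget is arbitrary. The sketch calls `errors.Is(err, http.ErrServerClosed)`, with the inspected error as the first argument and the target as the second, which is the order the standard library defines.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// servePing serves handler on ln until ctx is cancelled, then shuts the server
// down. The returned channel yields nil on a clean stop.
func servePing(ctx context.Context, ln net.Listener, handler http.Handler) <-chan error {
	srv := &http.Server{Handler: handler}
	done := make(chan error, 1)

	go func() {
		err := srv.Serve(ln)
		// errors.Is takes the inspected error first and the target second.
		if errors.Is(err, http.ErrServerClosed) {
			err = nil // expected result of Shutdown, not a failure
		}
		done <- err
	}()

	go func() {
		// Assumption: ctx is the process-lifetime context, so this unblocks
		// when the relay miner is asked to stop.
		<-ctx.Done()
		shutdownCtx, cancel := context.WithTimeout(context.Background(), time.Second)
		defer cancel()
		_ = srv.Shutdown(shutdownCtx)
	}()

	return done
}

// runAndCancel starts the server, cancels the context, and waits for the stop.
func runAndCancel() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	ctx, cancel := context.WithCancel(context.Background())
	done := servePing(ctx, ln, http.NotFoundHandler())
	cancel()

	select {
	case err := <-done:
		return err
	case <-time.After(5 * time.Second):
		return errors.New("ping server did not stop in time")
	}
}

func main() {
	if err := runAndCancel(); err != nil {
		panic(err)
	}
	fmt.Println("ping server stopped cleanly")
}
```

Holding an explicit `http.Server` value, rather than calling the package-level `http.Serve`, is what makes `Shutdown` available to the goroutine that waits on `ctx.Done()`.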

Comment on lines +149 to +151
// Start a long-lived goroutine that starts an HTTP server responding to
// ping requests. A single ping request on the relay server broadcasts a
// ping to all backing services/data nodes.
Member

Suggested change
- // Start a long-lived goroutine that starts an HTTP server responding to
- // ping requests. A single ping request on the relay server broadcasts a
- // ping to all backing services/data nodes.
+ // StartPingServer launches a goroutine that:
+ // - Creates a long-running HTTP server
+ // - Handles ping requests by broadcasting health checks to all backing services
+ // - Tests connectivity to all configured data nodes


// RelayMinerPingConfig is the structure resulting from parsing the ping
// server configuration.
type RelayMinerPingConfig struct {
Member

Suggested change
- type RelayMinerPingConfig struct {
+ // TODO_TECHDEBT(@red-0ne): Remove this structure altogether. See the discussion here for ref:
+ // https://github.com/pokt-network/poktroll/pull/1037/files#r1928599958
+ type RelayMinerPingConfig struct {

Comment on lines +45 to +48
relayMinerConfig.Ping = &RelayMinerPingConfig{
Enabled: yamlRelayMinerConfig.Ping.Enabled,
Addr: yamlRelayMinerConfig.Ping.Addr,
}
Member

Let's make this change.

@@ -206,3 +214,22 @@ func (rp *relayerProxy) validateConfig() error {

return nil
}

// PingAll tests the connectivity between all the managed relay servers and their respective backend URLs.
func (rp *relayerProxy) PingAll(ctx context.Context) error {
Member

Love this

Member

@okdas okdas left a comment

Only checked the Tiltfile and kubernetes yaml part - looks good to me. Thank you @eddyzags!

Labels
community A ticket intended to potentially be picked up by a community member nice-to-have Not-important and not-urgent tooling Tooling - CLI, scripts, helpers, off-chain, etc...
Projects
Status: 👀 In review

4 participants